Simulate and plot some data from simple ARIMA models. a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).
ar1 <- function(phi, n = 100L) {
y <- numeric(n)
e <- rnorm(n)
for (i in 2:n) {
y[i] <- phi * y[i - 1] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
- Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
Some examples of changing \(\phi_1\)
ar1(0.6)
## # A tsibble: 100 x 2 [1]
## idx y
## <int> <dbl>
## 1 1 0
## 2 2 -1.55
## 3 3 -2.56
## 4 4 -1.31
## 5 5 -0.865
## 6 6 -1.76
## 7 7 -1.07
## 8 8 -0.766
## 9 9 -1.70
## 10 10 -1.38
## # ℹ 90 more rows
ar1(0.6) |> autoplot(y) + labs(title=expression(paste(phi, "=0.6")))
ar1(0.95) |> autoplot(y) + labs(title=expression(paste(phi, "=0.95")))
ar1(0.05) |> autoplot(y) + labs(title=expression(paste(phi, "=0.05")))
ar1(-0.65) |> autoplot(y) + labs(title=expression(paste(phi, "=-0.65")))
- Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\).
ma1 <- function(theta, n = 100L) {
y <- numeric(n)
e <- rnorm(n)
for (i in 2:n) {
y[i] <- theta * e[i - 1] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
- Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
ma1(0.6) |> autoplot(y) + labs(title=expression(paste(theta, "=0.6")))
ma1(0.95) |> autoplot(y) + labs(title=expression(paste(theta, "=0.95")))
ma1(0.05) |> autoplot(y) + labs(title=expression(paste(theta, "=0.05")))
ma1(-0.65) |> autoplot(y) + labs(title=expression(paste(theta, "=-0.65")))
- Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2=1\).
arma11 <- function(phi, theta, n = 100) {
y <- numeric(n)
e <- rnorm(n)
for (i in 2:n) {
y[i] <- phi * y[i - 1] + theta * e[i - 1] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
arma11(0.6, 0.6) |> autoplot(y)
- Generate data from an AR(2) model with \(\phi_{1} =-0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
ar2 <- function(phi1, phi2, n = 100) {
y <- numeric(n)
e <- rnorm(n)
for (i in 3:n) {
y[i] <- phi1 * y[i - 1] + phi2 * y[i - 2] + e[i]
}
tsibble(idx = seq_len(n), y = y, index = idx)
}
ar2(-0.8, 0.3) |> autoplot(y)
- Graph the latter two series and compare them.
See graphs above. The non-stationarity of the AR(2) process has led to increasing oscillations
Consider
aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
- Use
ARIMA()to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
aus_airpassengers |> autoplot(Passengers)
fit <- aus_airpassengers |>
model(arima = ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit |> gg_tsresiduals()
fit |> forecast(h = 10) |> autoplot(aus_airpassengers)
- Write the model in terms of the backshift operator.
## year
## 4.307764
\[(1-B)^2y_t = (1+\theta B)\varepsilon_t\] where \(\varepsilon\sim\text{N}(0,\sigma^2)\), \(\theta = -0.90\) and \(\sigma^2 = 4.31\).
- Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
aus_airpassengers |>
model(arima = ARIMA(Passengers ~ 1 + pdq(0,1,0))) |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
- Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to part b. Remove the constant and see what happens.
aus_airpassengers |>
model(arima = ARIMA(Passengers ~ 1 + pdq(2,1,2))) |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
aus_airpassengers |>
model(arima = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## # A mable: 1 x 1
## arima
## <model>
## 1 <NULL model>
- Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
aus_airpassengers |>
model(arima = ARIMA(Passengers ~ 1 + pdq(0,2,1))) |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
The forecast trend is now quadratic, and there is a warning that this is generally a bad idea.
For the United States GDP series (from
global_economy):
- If necessary, find a suitable Box-Cox transformation for the data;
us_economy <- global_economy |>
filter(Code == "USA")
us_economy |>
autoplot(GDP)
lambda <- us_economy |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.2819443
us_economy |>
autoplot(box_cox(GDP, lambda))
It seems that a Box-Cox transformation may be useful here.
- fit a suitable ARIMA model to the transformed data using
ARIMA();
fit <- us_economy |>
model(ARIMA(box_cox(GDP, lambda)))
report(fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
- try some other plausible models by experimenting with the orders chosen;
fit <- us_economy |>
model(
arima010 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 0)),
arima011 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 1)),
arima012 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 2)),
arima013 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0, 1, 3)),
arima110 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)),
arima111 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 1)),
arima112 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 2)),
arima113 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 3)),
arima210 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 0)),
arima211 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 1)),
arima212 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 2)),
arima213 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2, 1, 3)),
arima310 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 0)),
arima311 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 1)),
arima312 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 2)),
arima313 = ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(3, 1, 3))
)
- choose what you think is the best model and check the residual diagnostics;
fit |>
glance() |>
arrange(AICc) |>
select(.model, AICc)
## # A tibble: 16 × 2
## .model AICc
## <chr> <dbl>
## 1 arima110 657.
## 2 arima011 659.
## 3 arima111 659.
## 4 arima210 659.
## 5 arima012 660.
## 6 arima112 661.
## 7 arima211 661.
## 8 arima310 662.
## 9 arima013 662.
## 10 arima312 663.
## 11 arima311 664.
## 12 arima113 664.
## 13 arima212 664.
## 14 arima313 665.
## 15 arima213 666.
## 16 arima010 668.
The best according to the AICc values is the ARIMA(1,1,0) w/ drift model.
best_fit <- us_economy |>
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)))
best_fit |> report()
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
best_fit |> gg_tsresiduals()
augment(best_fit) |> features(.innov, ljung_box, dof = 1, lag = 10)
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1, 1, 0)) 3.81 0.923
The residuals pass the Ljung-Box test, but the histogram looks like negatively skewed.
- produce forecasts of your fitted model. Do the forecasts look reasonable?
best_fit |>
forecast(h = 10) |>
autoplot(us_economy)
These look reasonable. Let’s compare a model with no transformation.
fit1 <- us_economy |> model(ARIMA(GDP))
fit1 |>
forecast(h = 10) |>
autoplot(us_economy)
Notice the effect of the transformation on the forecasts. Increase the forecast horizon to see what happens. Notice also the width of the prediction intervals.
us_economy |>
model(
ARIMA(GDP),
ARIMA(box_cox(GDP, lambda))
) |>
forecast(h = 20) |>
autoplot(us_economy)
- compare the results with what you would obtain using
ETS()(with no transformation).
us_economy |>
model(ETS(GDP)) |>
forecast(h = 10) |>
autoplot(us_economy)
Consider the number of Snowshoe Hare furs traded by the Hudson Bay Company between 1845 and 1935 (data set
pelt). a. Produce a time plot of the time series. b. Assume you decide to fit the following model: \[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \phi_4 y_{t-4} + \varepsilon_t, \] where \(\varepsilon_t\) is a white noise series. What sort of ARIMA model is this (i.e., what are \(p\), \(d\), and \(q\))?
- By examining the ACF and PACF of the data, explain why this model is appropriate.
pelt |> gg_tsdisplay(plot="partial")
## Plot variable not specified, automatically selected `y = Hare`
fit <- pelt |> model(AR4 = ARIMA(Hare ~ pdq(4,0,0)))
fit |> gg_tsresiduals()
- The last five values of the series are given below:
| Year | 1931 | 1932 | 1933 | 1934 | 1935 |
|---|---|---|---|---|---|
| Number of hare pelts | 19520 | 82110 | 89760 | 81660 | 15760 |
The estimated parameters are \(c = 30993\), \(\phi_1 = 0.82\), \(\phi_2 = -0.29\), \(\phi_3 = -0.01\), and \(\phi_4 = -0.22\). Without using the
forecastfunction, calculate forecasts for the next three years (1936–1939).
\[\begin{align*} \hat{y}_{T+1|T} & = 30993 + 0.82* 15760 -0.29* 81660 -0.01* 89760 -0.22* 82110 = 2051.57 \\ \hat{y}_{T+2|T} & = 30993 + 0.82* 2051.57 -0.29* 15760 -0.01* 81660 -0.22* 89760 = 8223.14 \\ \hat{y}_{T+3|T} & = 30993 + 0.82* 8223.14 -0.29* 2051.57 -0.01* 15760 -0.22* 81660 = 19387.96 \\ \end{align*}\]
- Now fit the model in R and obtain the forecasts using
forecast. How are they different from yours? Why?
pelt |>
model(ARIMA(Hare ~ pdq(4, 0, 0))) |>
forecast(h=3)
## # A fable: 3 x 4 [1Y]
## # Key: .model [1]
## .model Year Hare .mean
## <chr> <dbl> <dist> <dbl>
## 1 ARIMA(Hare ~ pdq(4, 0, 0)) 1936 N(2052, 5.9e+08) 2052.
## 2 ARIMA(Hare ~ pdq(4, 0, 0)) 1937 N(8223, 9.8e+08) 8223.
## 3 ARIMA(Hare ~ pdq(4, 0, 0)) 1938 N(19388, 1.1e+09) 19388.
Any differences will be due to rounding errors.
The population of Switzerland from 1960 to 2017 is in data set
global_economy.
- Produce a time plot of the data.
swiss_pop <- global_economy |>
filter(Country == "Switzerland") |>
select(Year, Population) |>
mutate(Population = Population / 1e6)
autoplot(swiss_pop, Population)
- You decide to fit the following model to the series: \[y_t = c + y_{t-1} + \phi_1 (y_{t-1} - y_{t-2}) + \phi_2 (y_{t-2} - y_{t-3}) + \phi_3( y_{t-3} - y_{t-4}) + \varepsilon_t\] where \(y_t\) is the Population in year \(t\) and \(\varepsilon_t\) is a white noise series. What sort of ARIMA model is this (i.e., what are \(p\), \(d\), and \(q\))?
This is an ARIMA(3,1,0), hence \(p=3\), \(d=1\) and \(q=0\).
- Explain why this model was chosen using the ACF and PACF of the differenced series.
swiss_pop |> gg_tsdisplay(Population, plot="partial")
swiss_pop |> gg_tsdisplay(difference(Population), plot="partial")
The significant spike at lag 3 in the PACF, coupled with the exponential decay in the ACF, for the differenced series, signals an AR(3) for the differenced series.
- The last five values of the series are given below.
| Year | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|
| Population (millions) | 8.09 | 8.19 | 8.28 | 8.37 | 8.47 |
The estimated parameters are \(c = 0.0053\), \(\phi_1 = 1.64\), \(\phi_2 = -1.17\), and \(\phi_3 = 0.45\). Without using the
forecastfunction, calculate forecasts for the next three years (2018–2020).
\[\begin{align*} \hat{y}_{T+1|T} & = 0.0053 + 8.47+ 1.64* (8.47 - 8.37) -1.17* (8.37 - 8.28) + 0.45* (8.28 - 8.19) = 8.56 \\ \hat{y}_{T+2|T} & = 0.0053 + 8.56+ 1.64* (8.56 - 8.47) -1.17* (8.47 - 8.37) + 0.45* (8.37 - 8.28) = 8.65 \\ \hat{y}_{T+3|T} & = 0.0053 + 8.65+ 1.64* (8.65 - 8.56) -1.17* (8.56 - 8.47) + 0.45* (8.47 - 8.37) = 8.73 \\ \end{align*}\]
- Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
global_economy |>
filter(Country == "Switzerland") |>
mutate(Population = Population / 1e6) |>
model(ARIMA(Population ~ 1 + pdq(3, 1, 0))) |>
forecast(h=3)
## # A fable: 3 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Population .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2018 N(8.6, 0.00013) 8.56
## 2 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2019 N(8.6, 0.001) 8.65
## 3 Switzerland ARIMA(Population ~ 1 + pdq(3, 1, 0)) 2020 N(8.7, 0.0033) 8.73
Any differences will be due to rounding errors.